ABSTRACT

We present a method for accelerating the calculation of CMB power spectra, matter power
spectra and likelihood functions for use in cosmological parameter estimation. The algorithm,
called CosmoNet, is based on training a multilayer perceptron neural network and shares
all the advantages of the recently released Pico algorithm of Fendt & Wandelt, but has
several additional benefits in terms of simplicity, computational speed, memory requirements
and ease of training. We demonstrate the capabilities of CosmoNet by computing CMB
power spectra over a box in the parameter space of flat ΛCDM models containing the 3σ
WMAP1 confidence region. We also use CosmoNet to compute the WMAP3 likelihood
for flat ΛCDM models and show that the marginalised posteriors derived on the parameters
are very similar to those obtained using CAMB and the WMAP3 code. We find that the average
error in the power spectra is typically 2-3% of cosmic variance, and that CosmoNet is
$\sim 7 \times 10^4$ times faster than CAMB (for flat models) and $\sim 6 \times 10^6$ times faster than the official
WMAP3 likelihood code. CosmoNet and an interface to COSMOMC are publicly
available at www.mrao.cam.ac.uk/software/cosmonet.
1 INTRODUCTION

In the analysis of increasingly high-precision data sets, it is now 
common practice in cosmology to constrain cosmological param- 
eters using sampling based methods, most notably Markov chain 
Monte Carlo (MCMC) techniques (Christensen et al. 2001; Knox, 
Christensen & Skordis 2001; Lewis & Bridle 2002). This approach 
typically requires one to calculate theoretical CMB power spectra 
(i.e. some subset of the TT, TE, EE and BB $C_\ell$ spectra) and/or the
matter power spectrum P(k) at a large number of points (typically 
~ 10^ or more) in the cosmological parameter space. In addition, 
one must also evaluate at each point the corresponding (combined) 
likelihood function for the data set(s) under consideration. As a result,
the process can be computationally very demanding.

The purist would calculate the required power spectra at each
point using codes such as CMBfast (Seljak & Zaldarriaga 1996)
or CAMB (Lewis, Challinor & Lasenby 2000), which typically require
around 10 s for spatially-flat models and 50 s for non-flat
models. This approach is therefore computationally demanding,
but does have the advantage that it is simple to generalise
if one wishes to include new physics or change the form of the
initial power spectra. MCMC parameter estimation codes such as
COSMOMC (Lewis & Bridle 2002) attempt to decrease the overall
computational burden by dividing the cosmological parameter
space into 'fast' parameters (governing the initial primordial power
spectra of scalar and tensor perturbations) and 'slow' parameters
(governing the perturbation evolution) and making judicious proposals
for how the chain is propagated in parameter space. Even
with this technique, however, the total computational cost is still
usually very high.

If one is willing to forego the full calculation of the required 
power spectra at each point in parameter space, there are a number 
of ways in which suitably accurate spectra can be generated some- 
what more rapidly. If the cosmological parameter space of interest 
is sufficiently small, then it is possible simply to create spectra for a 
regular grid of models in parameter space and interpolate between 
them in some way. As the number of parameters increases, however,
the computational cost of constructing the grid grows exponentially.
Fast grid generation schemes have been proposed, such
as the $\ell$-splitting scheme of Tegmark & Zaldarriaga (2000), which
exploits analytic approximations at high $\ell$ and insensitivity to certain
parameters at low $\ell$. Nevertheless, the pre-computation of the grid of
models remains extremely time-consuming and such approaches
become difficult to implement accurately when second-order effects
such as gravitational lensing are important.

More extensive use of analytic and semi-analytic approximations
can reduce the required number of pre-computed models, but
only at the cost of a loss of accuracy and/or placing restrictions on
the parameters that are available as input. Such approaches are usually
based on a relatively sparse grid of base models in the parameter
space from which the spectra of more general models are computed
rapidly on-the-fly using various (semi-)analytic approximations.
The DASH code of Kaplinghat, Knox & Skordis (2002) instead
stores a sparse grid of transfer functions (rather than $C_\ell$), uses
efficient choices for grid parameters and makes considerable use
of analytic approximations. Following ~40 hrs of computation on
a typical desktop to calculate the grid, DASH provides a speed-up
factor of ~30 relative to CMBfast in calculating a $C_\ell^{TT}$, $C_\ell^{TE}$ or $C_\ell^{EE}$
spectrum. More recently, the need to pre-compute a grid of models
has been removed in the CMBWARP package (Jimenez et al. 2004),
which builds on the method introduced by Kosowsky, Milosavljevic
& Jimenez (2002). In this approach, a new set of nearly uncorrelated
'physical parameters' is introduced upon which the CMB
power spectra have a simple dependence. CMBWARP uses a modified
polynomial fit in these parameters in which the coefficients
are based on the spectra $C_\ell^{TT}$, $C_\ell^{TE}$ or $C_\ell^{EE}$ for just a single fiducial
model in the parameter space. Spectra for other models can
then be calculated ~3000 times faster than with CMBfast. By
taking the fiducial model to be the best-fit model to the WMAP1
data, CMBWARP gives better than 0.5 per cent accuracy for the
$C_\ell^{TT}$ spectrum throughout the entire region of parameter space lying
within the WMAP1 3σ confidence region, although the accuracy
quickly reduces as one moves further away from the fiducial
model.

Although the above methods have proved extremely useful in
performing cosmological parameter estimation, they do exhibit a
number of drawbacks, as we have outlined. Most recently, this has
led Fendt & Wandelt (2006) to propose a more flexible and robust
machine-learning approach (called Pico) to accelerating both
power spectra and likelihood evaluations. In this method, one first
calculates the required spectra (usually $C_\ell^{TT}$, $C_\ell^{TE}$ or $C_\ell^{EE}$) using
CAMB and the corresponding likelihoods for the experiments of
interest (in particular WMAP3) at $\sim 10^4$ points chosen uniformly
within a box in parameter space that encompasses (say) the 3σ confidence
region of the WMAP3 likelihood. This constitutes the training
set for the Pico code. Note that only power spectra values at
the limited number of $\ell$-values output by CAMB are used (typically
50 values for $\ell_{max} = 1500$). In short, the basic algorithm used
by Pico consists of three major parts. First, the training set is compressed
using Karhunen-Loeve eigenmodes (essentially a principal
component analysis), which typically results in a reduction in the
dimensionality of the training set by a factor of two. Second, the training
set is used to divide the parameter space into (~100) smaller regions
using a k-means clustering algorithm (see e.g. MacKay 1997)
with the goal that all clusters encompass a volume of parameter space
over which the power spectra vary roughly equally. Finally, a (4th
order) polynomial is fitted within each cluster (by minimising the
squared error) to provide a local interpolation of the power spectra
within the cluster as a function of cosmological parameters. The
reason for dividing up the parameter space in the second step is that
the interpolation method used fails to model accurately the power
spectra over the entire parameter space.
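
The first of these three steps can be illustrated with a short sketch. This is not the Pico implementation; it is a generic principal-component compression written with NumPy, and the function names, the toy data and the choice of `n_modes` are assumptions made only for illustration.

```python
import numpy as np

def kl_compress(spectra, n_modes):
    """Project spectra onto their leading Karhunen-Loeve (PCA) modes.

    spectra : array of shape (n_models, n_ell), one power spectrum per row.
    n_modes : number of eigenmodes to keep (hypothetical tuning parameter).
    """
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    # Rows of vt are the orthonormal eigenmodes of the sample covariance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    modes = vt[:n_modes]
    coeffs = centred @ modes.T  # compressed representation of each model
    return mean, modes, coeffs

def kl_reconstruct(mean, modes, coeffs):
    """Map compressed coefficients back to full spectra."""
    return mean + coeffs @ modes

# Toy data: 200 "spectra" of 50 values that lie in a 5-dimensional subspace.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50))
mean, modes, coeffs = kl_compress(toy, n_modes=5)
max_err = np.abs(kl_reconstruct(mean, modes, coeffs) - toy).max()
```

For real spectra the number of retained modes would be chosen from the decay of the singular values rather than known in advance.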

The Pico approach provides about the same speed-up in spectrum
calculation as CMBWARP (which is an order of magnitude
faster than DASH), but is an order of magnitude more accurate. It
also has several other important advantages. First, it is very flexible
and can easily be applied to the fast calculation of any observables
relevant to a particular data set, such as scalar, tensor and lensed
power spectra, transfer functions or even higher-order correlation
functions. Second, it allows the calculation of such observables
from an arbitrary number of cosmological models and in any range
of $\ell$ (or $k$) values. Lastly, the algorithm is sufficiently generic to allow
the direct fitting of likelihood functions, thereby incorporating
the functionality of the CMBfit code of Sandvik et al. (2004). This last
capability allows an additional order of magnitude speed-up in cosmological
parameter estimation beyond that resulting from faster
power spectrum calculations, and is particularly important for experiments
such as WMAP3 for which the likelihood calculation is
very expensive.

In this letter, we present an independent approach to using
machine-learning techniques for accelerating both power spectra
and likelihood evaluations. Our approach is based on training a neural
network in the form of a 3-layer perceptron. The resulting CosmoNet
code shares all the advantages of Pico, but we believe also
has some additional benefits in terms of simplicity, computational
speed, accuracy, memory requirements and ease of training. The
letter is organised as follows. In Section 2 we give a brief introduction
to neural networks and our training algorithm. The resulting
network output is discussed in Section 3, where we investigate the
accuracy of our approach. The CosmoNet code is then used to
perform a cosmological parameter estimation from WMAP3 data
in Section 4. Our conclusions are presented in Section 5.



2 NEURAL NETWORK INTERPOLATION 

Neural networks are a methodology for computing motivated by 
the parallel architecture of animal brains. They consist of a group of 
interconnected processing elements called neurons that pass simple 
scalar messages between them to process information. Many neural 
networks provide feed-forward maps from a set of input neurons to 
a set of output neurons. For an introduction to feed-forward neu- 
ral networks see Bailer-Jones et al. (2001). They are often used to 
provide empirical models for processes that are too complicated 
to model from theoretical principles. An astrophysical example is 
presented in Vanzella et al. (2004), where photometric redshifts are 
predicted in the HDF-S from an ultra deep multicolour catalogue. 

2.1 Multilayer perceptron networks 

The perceptron (Rosenblatt 1958) is the simplest type of feed-forward
neuron and maps an input vector $x \in \Re^n$ to a scalar output
$f(x; w, \theta)$ via

$$f(x; w, \theta) = \sum_i w_i x_i + \theta, \qquad (1)$$

where $\{w_i\}$ and $\theta$ are the parameters of the perceptron, called the
'weights' and 'bias' respectively.

Multilayer perceptron neural networks (MLPs) are a type of
feed-forward network composed of a number of ordered layers of
perceptron neurons that pass scalar messages from one layer to the
next. In the simplest case, the network has two layers: the input
layer and the output layer. Each node in the output layer is a
perceptron and has an activation given by (1). In this paper, however,
we will work with a 3-layer network, which consists of an input
layer, a hidden layer and an output layer, as illustrated in Fig. 1.
In such a network, the outputs of the nodes in the hidden and output
layers take the form

$$\text{hidden layer:} \quad h_j = g^{(1)}(f_j^{(1)}); \quad f_j^{(1)} = \sum_l w_{jl}^{(1)} x_l + \theta_j^{(1)}, \qquad (2)$$

$$\text{output layer:} \quad y_i = g^{(2)}(f_i^{(2)}); \quad f_i^{(2)} = \sum_j w_{ij}^{(2)} h_j + \theta_i^{(2)}, \qquad (3)$$

where the index $l$ runs over input nodes, $j$ runs over hidden nodes
and $i$ runs over output nodes. The functions $g^{(1)}$ and $g^{(2)}$ are called
activation functions and are chosen to be bounded, smooth and
monotonic. In this letter, we use $g^{(1)}(x) = \tanh x$ and $g^{(2)}(x) = x$,
where the non-linear nature of the former is a key ingredient in
constructing a viable network.
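
As an illustration of equations (2) and (3), the forward map of such a network can be written in a few lines. The sketch below assumes NumPy; the function name, the array layout and the random weights are illustrative choices and not part of the CosmoNet code.

```python
import numpy as np

def mlp_forward(x, w1, theta1, w2, theta2):
    """Evaluate a 3-layer perceptron: tanh hidden layer, linear output layer.

    x      : input vector, shape (n_in,)
    w1     : hidden-layer weights, shape (n_hid, n_in)
    theta1 : hidden-layer biases, shape (n_hid,)
    w2     : output-layer weights, shape (n_out, n_hid)
    theta2 : output-layer biases, shape (n_out,)
    """
    h = np.tanh(w1 @ x + theta1)  # equation (2) with g1 = tanh
    return w2 @ h + theta2        # equation (3) with g2 = identity

# Example with 6 inputs, 50 hidden nodes and 50 outputs (random, untrained weights).
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 6, 50, 50
w1 = rng.normal(size=(n_hid, n_in))
theta1 = rng.normal(size=n_hid)
w2 = rng.normal(size=(n_out, n_hid))
theta2 = rng.normal(size=n_out)
y = mlp_forward(rng.normal(size=n_in), w1, theta1, w2, theta2)
```

Once trained, evaluating the network is just two matrix-vector products and one element-wise tanh.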
Figure 1. An example of a 3-layer neural network with seven input nodes,
three nodes in the hidden layer and five output nodes. Each line represents one
weight.

The weights $w$ and biases $\theta$ are the quantities we wish to determine,
which we denote collectively by $a$. As these parameters
vary, a very wide range of non-linear mappings between the inputs
and outputs is possible. In fact, according to a 'universal approximation
theorem' (Leshno et al. 1993), a standard multilayer feed-forward
network with a locally bounded piecewise continuous activation
function can approximate any continuous function to any
degree of accuracy if (and only if) the network's activation function
is not a polynomial. This result applies when activation functions
are chosen a priori and held fixed as $a$ varies. Accuracy increases
with the number of nodes in the hidden layer, and the above theorem tells us
we can always choose sufficient hidden nodes to produce any accuracy.
Since the mapping from cosmological parameter space to the
space of CMB power spectra (and WMAP3 likelihood) is known to
be continuous, a 3-layer MLP with an appropriate choice of activation
function is an excellent candidate model for the replacement of
the forward model provided by the CAMB package (and WMAP3
likelihood code).

The activation functions act as basic building blocks of non- 
linearity in a neural network model and should be as simple as 
possible. Additionally, the MemSys routines used in training (de- 
scribed below) require derivative information and so they should 
be differentiable. The universal approximation theorem thus moti- 
vates us to choose a monotonic (for simplicity), bounded and differ- 
entiable function that is not a polynomial and we choose the tanh 
function. Of course, this could be replaced by another such func- 
tion, such as the sigmoid function, but the interpolation results will 
be almost identical. 

2.2 Network training 

Let us consider building an empirical model of the CAMB mapping
using a 3-layer MLP as described above (a model of the
WMAP3 likelihood code can be constructed in an analogous manner).
The number of nodes in the input layer will correspond to
the number of cosmological parameters, and the number in the output
layer will be the number of uninterpolated $C_\ell$ values output
by CAMB. A set of training data $\mathcal{D} = \{x^{(k)}, t^{(k)}\}$ is provided by
CAMB (the precise form of which is described later) and the problem
now reduces to choosing the appropriate weights and biases of
the neural network that best fit this training data.

As the CAMB mapping is exact, this is a deterministic problem,
not a probabilistic one. We therefore wish to choose network
parameters $a$ that minimise the 'error' term $\chi^2(a)$ on the training
set given by

$$\chi^2(a) = \frac{1}{2} \sum_k \sum_i \left[ t_i^{(k)} - y_i(x^{(k)}; a) \right]^2. \qquad (4)$$

This is, however, a highly non-linear, multi-modal function in many 
dimensions whose optimisation poses a non-trivial problem. De- 
spite the deterministic nature of the problem we use an extension 
of a Bayesian method provided by the MemSys package (Gull & 
Skilling 1999). 
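
To make the optimisation target concrete, the sketch below evaluates the sum-of-squares error of equation (4) and its gradient by back-propagation for a tanh/linear 3-layer network. It assumes NumPy and a particular flat packing of the parameters into one vector; both the packing order and the function name are assumptions for illustration, and this is not the MemSys training code.

```python
import numpy as np

def chi2_and_grad(params, shapes, X, T):
    """Return chi^2 = 0.5 * sum (t - y)^2 and its gradient w.r.t. params.

    params : flat vector [w1, theta1, w2, theta2] (assumed packing order)
    shapes : (n_in, n_hid, n_out)
    X, T   : training inputs (K, n_in) and targets (K, n_out)
    """
    n_in, n_hid, n_out = shapes
    i = 0
    w1 = params[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in
    th1 = params[i:i + n_hid]; i += n_hid
    w2 = params[i:i + n_out * n_hid].reshape(n_out, n_hid); i += n_out * n_hid
    th2 = params[i:i + n_out]

    H = np.tanh(X @ w1.T + th1)   # hidden activations, shape (K, n_hid)
    Y = H @ w2.T + th2            # network outputs, shape (K, n_out)
    R = Y - T                     # residuals
    chi2 = 0.5 * np.sum(R ** 2)

    # Back-propagate the residuals through the two layers.
    g_w2 = R.T @ H
    g_th2 = R.sum(axis=0)
    dH = (R @ w2) * (1.0 - H ** 2)  # derivative of tanh is 1 - tanh^2
    g_w1 = dH.T @ X
    g_th1 = dH.sum(axis=0)
    grad = np.concatenate([g_w1.ravel(), g_th1, g_w2.ravel(), g_th2])
    return chi2, grad

# Small random problem used only to exercise the function.
rng = np.random.default_rng(1)
shapes = (3, 4, 2)
n_par = 4 * 3 + 4 + 2 * 4 + 2
params = rng.normal(scale=0.5, size=n_par)
X = rng.normal(size=(10, 3))
T = rng.normal(size=(10, 2))
chi2, grad = chi2_and_grad(params, shapes, X, T)
```

Any gradient-based optimiser could consume this pair; the multi-modality mentioned above is why the choice of optimiser and starting point matters.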

The MemSys algorithm considers the parameters $a$ of the network
to be probabilistic variables with prior probability distribution
proportional to $\exp(\alpha S(a))$, where $S(a)$ is the positive-negative
entropy functional (Gull & Skilling 1999; Hobson & Lasenby
1998) and $\alpha$ is considered a hyperparameter of the prior. The variable
$\alpha$ sets the scale over which variations in $a$ are expected, and
is chosen to maximise its marginal posterior probability. Its value
is inversely proportional to the standard deviation of the prior. For
fixed $\alpha$, the log-posterior is thus proportional to $-\chi^2(a) + \alpha S(a)$.
For each choice of $\alpha$ there is a solution $\hat{a}$ that maximises the posterior.
As $\alpha$ varies, the set of solutions $\hat{a}$ is called the 'maximum-entropy
trajectory'. We wish to find the maximum of $-\chi^2$, which is
the solution at the end of the trajectory where $\alpha = 0$. It is difficult
to recover results for $\alpha \neq \infty$ (for large $\alpha$ the solution is found at the
maximum of the prior) when starting with a result that lies far from
the trajectory. Thus for practical purposes, it is best to start from
the point on the trajectory at $\alpha = \infty$ and iterate $\alpha$ downwards until
either a Bayesian $\alpha$ is achieved or, in our deterministic case, $\alpha$ is
sufficiently small that the posterior is dominated by $\chi^2$.
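
The idea of following a regularisation trajectory from large to small $\alpha$ can be illustrated on a toy problem. The sketch below is not MemSys and does not use the entropy functional: it assumes a linear model and replaces $S(a)$ by the quadratic stand-in $-|a|^2/2$, for which every point on the trajectory has a closed form. All names and numbers are hypothetical.

```python
import numpy as np

def regularised_solution(X, t, alpha):
    """Minimiser of 0.5*|X a - t|^2 + 0.5*alpha*|a|^2.

    The quadratic penalty is a stand-in for the entropy prior (an assumption).
    """
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ t)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
a_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
t = X @ a_true  # noiseless targets, i.e. a deterministic mapping

# Walk alpha downwards from a very large value towards zero.
alphas = np.logspace(6, -8, 15)
trajectory = [regularised_solution(X, t, alpha) for alpha in alphas]
chi2 = [0.5 * np.sum((X @ a - t) ** 2) for a in trajectory]
```

At the large-$\alpha$ end the solution sits at the maximum of the prior (here $a = 0$), and as $\alpha$ decreases it moves towards the pure $\chi^2$ minimum. For a non-linear network each step would instead be an iterative optimisation started from the previous solution.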

MemSys performs the algorithm using conjugate gradients
at each step to converge to the maximum-entropy trajectory. The
required matrix of second derivatives of $\chi^2$ is approximated using
vector routines only. This avoids the need for the $O(N^3)$ operations
required to perform exact calculations, which would be impractical
for large problems. The application of MemSys to the problem of
network training allows for the fast, efficient training of relatively
large network structures on large data sets that would otherwise
be difficult to perform in a useful time-frame. The MemSys algorithms
are described in greater detail in Gull & Skilling (1999).
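
One generic way to work with second-derivative information using vector operations only is to approximate the product of the Hessian with a vector from two gradient evaluations. The sketch below shows that trick; it is an assumption about the kind of technique involved, not a description of the MemSys internals, and the function name and step size are illustrative.

```python
import numpy as np

def hessian_vector_product(grad_fn, a, v, eps=1e-6):
    """Approximate H(a) @ v by a central difference of gradients.

    Only gradient evaluations are needed; the matrix H is never formed.
    """
    return (grad_fn(a + eps * v) - grad_fn(a - eps * v)) / (2.0 * eps)

# Toy check on the quadratic 0.5 * a^T A a, whose Hessian is exactly A.
rng = np.random.default_rng(3)
M = rng.normal(size=(6, 6))
A = M @ M.T

def grad_fn(a):
    return A @ a

a0 = rng.normal(size=6)
v = rng.normal(size=6)
hv = hessian_vector_product(grad_fn, a0, v)
```

Conjugate-gradient solvers need only such products, so the cost per iteration stays proportional to that of a gradient evaluation.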



3 RESULTS 

We demonstrate our approach by training networks to replace the
CAMB package for the evaluation of the CMB power spectra $C_\ell^{TT}$,
$C_\ell^{TE}$ and $C_\ell^{EE}$ for flat ΛCDM models within a box in parameter space
that encompasses the 3σ confidence region of the WMAP 1-year
likelihood. We also train a network to replace the WMAP 3-year
likelihood code; however, for reasons discussed later in this
section, this interpolation was performed over a slightly smaller region
than 3σ. We train four separate networks: one for each CMB
power spectrum and one for the WMAP 3-year likelihood. It is possible
to provide all spectra and the likelihood from a single network,
but training speed is increased by keeping them separate.

The training data for the spectra interpolation is produced in
a similar way to that used to train the Pico algorithm. We define
the same box as Fendt & Wandelt in the 6-dimensional 'physical
parameter' space of flat ΛCDM models, which encompasses the
3σ confidence region of the likelihood determined from WMAP 1-year
data (Bennett et al. 2003) and other higher resolution CMB
data. This box is then sampled uniformly to select 2000 models.
The physical parameters ($\omega_b$, $\omega_{cdm}$, $\theta$, $\tau$, $n_s$, $A_s$) are converted back
to cosmological parameters ($\Omega_b$, $\Omega_{cdm}$, $H_0$, $z_{re}$, $n_s$, $A_s$) and used as
input to CAMB to produce the training set of CMB power spectra
out to $\ell_{max} = 1500$ (which corresponds to 50 uninterpolated $C_\ell$ values
for each spectrum). A further set of 10^ samples were generated
as testing data.

Building a training set for the likelihood was complicated by
errors in the WMAP 3-year likelihood code. Spuriously high likelihoods
were observed for some models lying outside of roughly
2σ (see footnote 1). These spikes in the likelihood surface prevented a reasonable
interpolation in these areas and so had to be eliminated from the
training set. In addition, sampling uniformly from a 3σ region in 6
dimensions returned very few samples around the maximum likelihood
point, making an accurate interpolation around the peak unworkable.
To correct for both of these problems we built our likelihood
training set from 5000 samples in parameter space drawn
from a Gaussian distribution centred on the maximum likelihood
point (restricted to the box encompassing our parameter priors).
The covariance matrix of the Gaussian was twice that of the expected
variance of the cosmological parameters and was found to
provide sufficient coverage, both for the peaks of the marginalised
posteriors and their tails.
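
A sampling scheme of this kind can be sketched with simple rejection sampling. The code below assumes NumPy; the centre, covariance and box limits are placeholder values and not the ones used for the WMAP3 training set, and the function name is hypothetical.

```python
import numpy as np

def sample_gaussian_in_box(mean, cov, lower, upper, n_samples, rng):
    """Draw n_samples from N(mean, cov), rejecting points outside the box."""
    accepted = []
    n_kept = 0
    while n_kept < n_samples:
        draw = rng.multivariate_normal(mean, cov, size=n_samples)
        inside = np.all((draw >= lower) & (draw <= upper), axis=1)
        accepted.append(draw[inside])
        n_kept += int(inside.sum())
    return np.concatenate(accepted)[:n_samples]

# Placeholder 6-parameter setup: unit expected variances, box at +/- 3 sigma.
rng = np.random.default_rng(4)
centre = np.zeros(6)                # stands in for the maximum-likelihood point
expected_cov = np.eye(6)            # stands in for the expected parameter covariance
train_cov = 2.0 * expected_cov      # proposal covariance doubled, as in the text
lower, upper = centre - 3.0, centre + 3.0
samples = sample_gaussian_in_box(centre, train_cov, lower, upper, 5000, rng)
```

Broadening the covariance keeps the density high near the peak while still placing a useful number of points in the tails.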

A small pre-processing step was used to make the variation in 
the training data of a similar order to the non-linearity present in the 
network activation functions. This involved mapping all inputs and 
outputs linearly so they had zero mean and a variance of one-half. 
Appropriate scaling of the data would be performed by network 
training if this step were omitted, but the speed of training is in- 
creased if the initial values of the weights are closer to their likely 
optimal values. Also, for the TE and EE spectra, a small number
(2-4) of separate neural networks were trained on separate regions
of the spectra, and then combined post training to provide a single
network for each spectrum. This step was required to provide 99th
percentile errors within those produced by the Pico algorithm.
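
The linear pre-processing map described at the start of this paragraph can be sketched as follows. This assumes NumPy and column-wise scaling; the helper names are illustrative and the toy data are random.

```python
import numpy as np

def fit_scaler(data, target_var=0.5):
    """Per-column offset and scale giving zero mean and variance target_var.

    Assumes no column of data is constant (its standard deviation is non-zero).
    """
    mean = data.mean(axis=0)
    scale = np.sqrt(target_var) / data.std(axis=0)
    return mean, scale

def apply_scaler(data, mean, scale):
    """Map raw values to the scaled space used for training."""
    return (data - mean) * scale

def invert_scaler(scaled, mean, scale):
    """Map network outputs back to physical units."""
    return scaled / scale + mean

rng = np.random.default_rng(5)
raw = rng.normal(loc=10.0, scale=3.0, size=(2000, 6))
mean, scale = fit_scaler(raw)
scaled = apply_scaler(raw, mean, scale)
```

The same offsets and scales fitted on the training set must be reused when the trained network is evaluated on new inputs.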

It was found that 50 nodes in the hidden layer for the TT spec- 
tra network, 125 nodes for the TE spectra network, 200 nodes for 
the EE spectra network, and 50 for the likelihood network, were 
sufficient to provide good results. The results of comparing the
CosmoNet output with CAMB over the testing set are shown in
Fig. 2. For all but the very low values of $\ell$ in the EE spectrum, where
the values of the spectrum and cosmic variance are small, the average
error is about 2-3% of cosmic variance. The 99th percentiles
are also comfortably below unit cosmic variance. A comparison of the
output of the CosmoNet likelihood network with the WMAP3
likelihood code over the testing set reveals a mean error of roughly
0.2 ln units close to the peak.
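
An error measure of this kind can be computed as in the sketch below. The text does not state the cosmic-variance expression used, so the code assumes the standard full-sky result for the temperature spectrum, $\sigma_\ell = \sqrt{2/(2\ell+1)}\,C_\ell^{TT}$; the TE and EE spectra would need their own expressions. Function names and the synthetic data are illustrative.

```python
import numpy as np

def cosmic_variance_tt(ells, cl_tt):
    """Full-sky cosmic-variance standard deviation of C_ell^TT (assumed form)."""
    return np.sqrt(2.0 / (2.0 * ells + 1.0)) * cl_tt

def error_summary(ells, cl_true, cl_pred):
    """Mean, 95th and 99th percentile of |error| / sigma_CV at each ell."""
    frac = np.abs(cl_pred - cl_true) / cosmic_variance_tt(ells, cl_true)
    return (frac.mean(axis=0),
            np.percentile(frac, 95, axis=0),
            np.percentile(frac, 99, axis=0))

# Synthetic test set: 500 models, 50 ell values, small random interpolation errors.
rng = np.random.default_rng(6)
ells = np.linspace(2, 1500, 50)
cl_true = rng.uniform(1.0, 2.0, size=(500, 50))
cl_pred = cl_true * (1.0 + 1e-3 * rng.normal(size=(500, 50)))
mean_err, p95, p99 = error_summary(ells, cl_true, cl_pred)
```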



4 APPLICATION TO COSMOLOGICAL PARAMETER 
ESTIMATION 

To illustrate the usefulness of CosmoNet in cosmological param- 
eter estimation we perform an analysis of the WMAP 3-year TT, 
TE and EE data using COSMOMC in three separate ways: (i) using 
CAMB power spectra and the WMAP3 likelihood code; (ii) using 
CosmoNet power spectra and the WMAP3 likelihood code; and
(iii) using the CosmoNet likelihood. The resulting marginalised
parameter constraints for each method are shown in Fig. 3 and
Fig. 4 and are clearly very similar.

Footnote 1: An example point is $\omega_b = 0.016048$, $\omega_{cdm} = 0.177486$, $\theta = 1.056867$,
$\tau = 0.501029$, $n_s = 1.078956$, $A_s = 3.022956$, with ln-likelihood
= -5373.




Figure 3. The one-dimensional marginalised posteriors on the cosmological 
parameters within the 6-parameter flat ΛCDM model comparing: CAMB
power-spectra and WMAP3 likelihood (red) with CosmoNet power spec- 
tra and WMAP3 likelihood (black). 

In each case 4 parallel MCMC chains were run on Intel Itanium 2
processors at the COSMOS cluster (SGI Altix 3700) at
DAMTP, Cambridge. The wall-clock computational time (see footnote 2) required
to gather ~20000 post burn-in MCMC samples was ~12 hours
for method (i) (with CAMB further parallelised over 3 additional
processors per chain, therefore totalling 16 CPUs), 8 hours for
method (ii) and roughly 35±5 minutes using the interpolated likelihood
with method (iii). For comparison, a similar run with the
Pico code took roughly 55±5 minutes, illustrating that it is now
the remaining sampling calls within COSMOMC that provide the
new bottleneck.



5 DISCUSSION AND CONCLUSIONS 

We have presented a method for accelerating power spectrum and 
likelihood evaluations based on the training of multilayer percep- 
tron neural networks, which we have shown to be fast, robust and 
accurate. Our CosmoNet method shares all the advantages of the 
Pico algorithm of Fendt & Wandelt, achieving similar accuracies on 
both spectra interpolations and cosmological parameter constraints, 
but there are several differences between the two methods that we 
believe give CosmoNet a number of additional benefits, which 
we now discuss. 

Simplicity. Despite requiring the optimisation of a highly non- 
linear multi-dimensional function using MemSys, we consider the 
principal advantage of our method to be the relative simplicity of 
the trained interpolation for the user to implement. CosmoNet 
provides a single simple, closed-form function for each interpola- 
tion over the whole of the parameter space under consideration. 

Footnote 2: The total CPU time is 4 times longer.

Footnote 3: Note that both method (iii) and Pico were run within the 3σ training region
of both algorithms, so CAMB was never called.

Figure 2. Comparison of the performance of CosmoNet versus CAMB for TT, TE and EE power spectra in 6-parameter flat ΛCDM models. The plots
show the average error together with the 95 and 99 percentiles in units of cosmic variance.

Figure 4. The one-dimensional marginalised posteriors on the cosmological
parameters within the 6-parameter flat ΛCDM model comparing: CAMB
power-spectra and WMAP3 likelihood (red) with CosmoNet likelihoods
(black).

Memory usage. A neural network with $N_{in}$ inputs, $N_{hid}$ nodes
in the hidden layer, and $N_{out}$ outputs has $(N_{in} + 1) N_{hid} + (N_{hid} + 1) N_{out} \approx N_{hid} N_{out}$
parameters. This is far less than in the Pico approach,
where the use of clustering and individual interpolations
for each $C_\ell$ requires far more parameters. In the case of the flat
ΛCDM example demonstrated in Section 3, we require about 100
kB of parameter memory for all three power spectra and the likelihood,
whereas Pico would require 15 MB. While the memory
requirements of Pico will increase with the number of cosmological
parameters, this should make little difference to the memory
requirements of our method, as we have found the required number
of nodes in the hidden layer does not increase beyond a factor of 2
for the 11-parameter non-flat model parameterised by $\Omega_b h^2$, $\Omega_c h^2$,
$\Omega_k$, $\theta$, $\tau$, massive neutrino fraction $f_\nu$, varying equation of state of
dark energy $w$, scalar perturbation amplitude and spectral index $A_s$,
$n_s$ and tensor modes with amplitude ratio and spectral index $R$, $n_t$
(results to appear in a forthcoming paper), thus representing a linear
rise.
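
The parameter count can be checked against the network sizes quoted in Section 3. The sketch below assumes 6 inputs, 50 outputs per spectrum, a single output for the likelihood network and 4-byte floating-point storage; the last two are assumptions, and the count ignores the fact that the TE and EE networks were assembled from several sub-networks.

```python
def n_parameters(n_in, n_hid, n_out):
    """Number of weights and biases in a 3-layer perceptron."""
    return (n_in + 1) * n_hid + (n_hid + 1) * n_out

# (n_in, n_hid, n_out) for each network; the likelihood output size is assumed.
networks = {
    "TT": (6, 50, 50),
    "TE": (6, 125, 50),
    "EE": (6, 200, 50),
    "likelihood": (6, 50, 1),
}
counts = {name: n_parameters(*shape) for name, shape in networks.items()}
total = sum(counts.values())
kilobytes_single = total * 4 / 1000.0  # assumes 4-byte floats
```

Under these assumptions the total is roughly 22,000 parameters, or a little under 90 kB, which is consistent with the figure of about 100 kB quoted above.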

Speed. The number of calculations required to perform the
feed-forward network mapping is $2 N_{in} N_{hid} + 2 N_{hid} N_{out} \approx 2 N_{hid} N_{out}$.
In the example presented in Section 3, the calculation
of the 50 uninterpolated $C_\ell$ values for each spectrum required ~120
microseconds, whereas each WMAP3 likelihood took ~10
microseconds; this is ~25 times faster than Pico in performing


the interpolation. Moreover, the CPU requirements of the PiCO in- 
terpolation scales as {^ Y, for PiCO p is 4. 

Ease of training. Training using the MemSys package is almost
totally automated and relatively quick. In fact the bottleneck
in providing appropriate network weights for more complex cosmological
models is the calculation of the training data using the
CAMB package. The networks used in Section 3 took around 100
hours of training (on a standard PC workstation). Additionally,
MemSys training time scales linearly with the number of network
nodes and again linearly with the number of entries in the training
set. However, networks that are roughly 2-3 times less accurate,
but still sufficient to provide good parameter constraints, can
be trained in under an hour using just 1000 training samples. These
simpler neural networks did not require the $\ell$-splitting performed in
Section 3 and had only 50 nodes in the hidden layer.



ACKNOWLEDGMENTS 

TA acknowledges a studentship from EPSRC. MB was supported 
by a Benefactors Scholarship at St. John's College, Cambridge and 
an Isaac Newton studentship. This work was conducted in coop- 
eration with SGI/Intel utilising the Altix 3700 supercomputer at 
DAMTP Cambridge supported by HEFCE and PPARC. We thank 
S. Rankin and V. Treviso for their assistance. 



REFERENCES 

Bailer-Jones C., 2001, Automated Data Analysis in Astronomy.
Narosa Publishing House, New Delhi
Bennett C. et al., 2003, ApJS, 148, 1
Christensen N., Meyer R., Knox L., Luey B., 2001,
Class. Quant. Grav., 18, 2677
Fendt W., Wandelt B., 2006, ApJ, submitted (astro-ph/0606709)
Gull S.F., Skilling J., 1999, Quantified maximum entropy: MemSys 5
users' manual. Maximum Entropy Data Consultants Ltd,
Royston
Hinshaw G. et al., 2006, ApJ, submitted (astro-ph/0603451)
Hobson M.P., Lasenby A.N., 1998, MNRAS, 298, 905
Jimenez R., Verde L., Peiris H., Kosowsky A., 2004, PRD, 70,
023005
Kaplinghat M., Knox L., Skordis C., 2002, ApJ, 578, 665
Kosowsky A., Milosavljevic M., Jimenez R., 2002, PRD, 66,
063007
Knox L., Christensen N., Skordis C., 2001, ApJ, 563, L95
Lewis A., Bridle S., 2002, PRD, 66, 103511
Lewis A., Challinor A., Lasenby A., 2000, ApJ, 538, 473






T. Auld et al. 



Leshno M., Lin V. Ya., Pinkus A., Schocken S., 1993, Neural
Netw., 6, 861
MacKay D.J.C., 2003, Information theory, inference and learning
algorithms. Cambridge University Press
Page L. et al., 2006, ApJ, submitted (astro-ph/0603450)
Rosenblatt F., 1958, Psychological Review, 65, 386
Sandvik H.B., Tegmark M., Wang X., Zaldarriaga M., 2004, PRD,
69, 063005
Seljak U., Zaldarriaga M., 1996, ApJ, 469, 437
Spergel D. et al., 2006, ApJ, submitted (astro-ph/0603449)
Tegmark M., Zaldarriaga M., 2000, ApJ, 544, 30
Vanzella E. et al., 2004, A&A, 423, 761

This paper has been typeset from a TeX/LaTeX file prepared by the
author.



